A new extension of Poisson distribution for asymmetric count data: theory, classical and Bayesian estimation with application to lifetime data

Several research investigations have stressed the importance of discrete data analysis and its relevance to actual events. The current work focuses on a new discrete distribution with a single parameter that can be derived using the Poisson mixing technique. The new distribution is named the Poisson Entropy-Based Weighted Exponential Distribution. It is useful for discussing asymmetric “right-skewed” data with “heavy” tails. Its failure rate function can be used to explain situations with increasing failure rates. The statistical properties of the new distribution are expressed explicitly. The proposed model is simple to manage for under-, equal-, and over-dispersed datasets. The model parameters are estimated using the maximum likelihood method. We consider the parameter estimation for the new model based on right-censored data with a cure fraction. One more focus of the present study is the Bayesian estimation of the model parameters. In the end, three real-world dataset examples were utilized to show the value of the new distribution. These applications revealed that the new model outperforms other standard discrete models.


INTRODUCTION
Numerous studies have emphasized the relevance of count data modeling, which has aroused significant interest in fields such as medical research, earth science, physics, economics, and insurance. Various lifetime probability distributions have been utilized and investigated in reliability theory. The Poisson distribution is commonly used to analyze "symmetric" and "asymmetric" count datasets, but it cannot describe over-dispersed datasets because its mean and variance are equal. As a result, there has been a lot of interest in the discretization of continuous probability distributions. Several techniques may be used to obtain the discrete analog of a continuous probability distribution. The Poisson mixing approach has received considerable attention from researchers and is commonly used to generalize existing probability distributions or to generate new ones. It is described below.
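If the Poisson parameter is a random variable with a parameterized distribution (P), then the resulting model is a discrete Poisson mixed model. The distribution P and its parameter vector Θ are referred to as the prior distribution and the hyperparameter, respectively. The resulting distribution of the random variable X is stated as follows:
$$P(X = x) = \int_0^{\infty} P(X = x \mid \Lambda = \lambda)\, f(\lambda; \Theta)\, d\lambda, \qquad (1)$$
where $X \mid \Lambda$ follows the Poisson distribution with parameter $\lambda$,
$$P(X = x \mid \Lambda = \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots, \qquad (2)$$
$f(\lambda; \Theta)$ is a continuous density function, and Λ is the random variable representing the Poisson parameter λ.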
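The mixing construction described above can be checked numerically. The following is a minimal sketch, not the authors' code: it integrates the Poisson pmf against a mixing density with Simpson's rule. Because the EBWED density is not reproduced here, an exponential mixing density with rate `beta` is used as a stand-in (an assumption made purely for illustration); for that choice the mixture is known to be geometric, which gives a closed form to compare against. The names `mixed_poisson_pmf`, `upper`, and `n_steps` are hypothetical.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability P(X = x | lambda), evaluated on the log scale."""
    if lam <= 0.0:
        return 1.0 if x == 0 else 0.0
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def mixed_poisson_pmf(x, mixing_pdf, upper=60.0, n_steps=6000):
    """Integrate (Poisson pmf) * (mixing density) over [0, upper] by Simpson's rule.

    `upper` truncates the infinite integration range and `n_steps` (even)
    sets the grid; both are illustrative tuning values.
    """
    h = upper / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        weight = 1 if i in (0, n_steps) else (4 if i % 2 else 2)
        lam = i * h
        total += weight * poisson_pmf(x, lam) * mixing_pdf(lam)
    return total * h / 3.0


# Stand-in mixing density (an assumption, NOT the EBWED): exponential with rate beta.
beta = 0.5


def exponential_pdf(lam: float) -> float:
    return beta * math.exp(-beta * lam)


for x in range(4):
    numeric = mixed_poisson_pmf(x, exponential_pdf)
    closed_form = beta / (1.0 + beta) ** (x + 1)  # geometric pmf for this stand-in
    print(x, round(numeric, 6), round(closed_form, 6))
```

Replacing `exponential_pdf` with any other density on the positive half-line gives the corresponding mixed-Poisson pmf, subject to choosing `upper` large enough for the tail of that density.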
Al-Nasser, Rawashdeh & Talal (2022) introduced a new weighted exponential distribution, named the entropy-based weighted exponential distribution (EBWED). Let X be a continuous random variable that follows the EBWED with a single parameter (β). The probability density function (pdf) of the EBWED is
The cumulative distribution function (cdf) of the EBWED is
The innovation of this study is the derivation of a new Poisson mixed distribution for under-, equal-, and over-dispersed count datasets to address the above-mentioned issues. The main objective is to introduce a new flexible Poisson entropy-based weighted exponential distribution, obtained by mixing the Poisson distribution with the entropy-based weighted exponential distribution. The moments and associated measures of the new distribution can be calculated analytically, and, compared with existing discrete distributions, it has a strong modeling capability and is highly adaptable. The model parameter is estimated using the maximum likelihood estimation (MLE) method, and a comprehensive simulation is performed to assess the behavior of the ML estimates. The new distribution is used to model "asymmetric" and "right-skewed" data in the presence of complete and right-censored data. We also take into account censored data with a cure fraction. The Bayesian estimation approach is also utilized to estimate the model parameter.
The rest of the document is structured as follows. The derivation of the new discrete probability model is presented in "The PEBWE Distribution". "Moments and Associated Measures" discusses its underlying mathematical characteristics. "Parameter Estimation" discusses maximum likelihood estimation of the distribution parameter using complete data, censored data, and censored data with a cure fraction. This section also discusses Bayesian estimation using the MCMC approach. Three examples are given in "Application" to illustrate the adaptability of the new distribution. Finally, concluding remarks and some future directions are given in "Conclusion".

THE PEBWE DISTRIBUTION
The following proposition introduces a new mixed-Poisson model by combining the Poisson and Entropy-Based Weighted Exponential distributions.
Proposition 1. Suppose that X follows the compound Poisson-EBWE distribution (PEBWED), which has the following stochastic representation:
$$X \mid \lambda \sim \text{Poisson}(\lambda), \qquad \lambda \mid \beta \sim \text{EBWED}(\beta),$$
where λ > 0 and β > 0. Then, the pmf of X is given by
The new model is denoted as PEBWED(β), and we write X ∼ PEBWED(β) to indicate that X follows the PEBWED with parameter β.
Proof. The pmf of X can be obtained using the common mixing method shown below.
The proof is completed. Figure 1 depicts the potential pmf plots of the proposed distribution.
Remark: The first derivative of the pmf is
For β > 0.6934, x is a critical point that maximizes p(x; β), whereas for 0 < β ≤ 0.6914 the pmf is a decreasing function of x. Therefore, the mode of the PEBWED is given by
The cdf and survival function of the PEBWED are given by
The hazard function (hf) of the PEBWED is given by
Proposition 2: The PEBWED hf increases as x increases.
Proof: Using the idea of Glaser (1980) and the pmf of the PEBWED,
As ρ′(x) > 0, the hf of the PEBWED is an increasing function. Furthermore, the graphs in Fig. 2 show the possible shapes for the PEBWED.
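A monotone hazard can also be checked numerically once a pmf is available. The sketch below is an illustration under stated assumptions rather than the paper's derivation: it builds the cdf, survival function, hazard, and mode from any pmf on 0, 1, 2, .... The discrete hazard is taken as p(x) / P(X ≥ x), which is one common convention and may differ from the one used for the PEBWED. A Poisson pmf with rate 2.5 serves as a stand-in because the PEBWED pmf is not reproduced here; `discrete_summaries` is a hypothetical helper name.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Stand-in pmf (an assumption): Poisson with rate lam."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def discrete_summaries(pmf, x_max: int):
    """Return cdf, survival P(X > x), hazard, and mode on the grid 0..x_max."""
    probs = [pmf(x) for x in range(x_max + 1)]
    cdf, running = [], 0.0
    for p in probs:
        running += p
        cdf.append(running)
    survival = [1.0 - c for c in cdf]                  # P(X > x)
    at_risk = [1.0] + survival[:-1]                    # P(X >= x)
    hazard = [p / r for p, r in zip(probs, at_risk)]   # assumed convention
    mode = max(range(x_max + 1), key=lambda x: probs[x])
    return cdf, survival, hazard, mode


cdf, survival, hazard, mode = discrete_summaries(lambda x: poisson_pmf(x, 2.5), 10)
is_increasing = all(a < b for a, b in zip(hazard, hazard[1:]))
print("mode:", mode, "hazard increasing:", is_increasing)
```

The grid is kept short (`x_max = 10`) because the survival probabilities are formed by subtraction and lose precision far in the tail.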

MOMENTS AND ASSOCIATED MEASURES
In this section, the moments, probability generating function, moment generating function, and associated measures (mean, variance, dispersion index, skewness, and kurtosis) are derived and discussed.
Proposition 3: The rth factorial moments of the PEBWED are given by
Proof: The factorial moment can be calculated using the compound-Poisson theory as follows:
which completes the proof. By setting r = 1, 2, 3, and 4 in Eq. (10), the first four factorial moments of the PEBWED can be derived.
That is,
Now, using the general connection between factorial moments and moments about the origin, the first four moments about the origin of the PEBWED are obtained. We get
Therefore, the variance of the PEBWED is obtained as
The dispersion index (DI) of the PEBWED is given by
To obtain explicit formulations for the skewness and kurtosis of the PEBWED, apply the following equations.
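The route from factorial moments to the summary measures is generic and can be sketched in a few lines. The example below uses the standard relations between the first four factorial moments and the raw moments (with Stirling numbers of the second kind as coefficients), then forms the variance, dispersion index, skewness, and kurtosis. It is checked on a Poisson distribution with rate 2, whose rth factorial moment is 2^r; that check case is a stand-in chosen for illustration, not the PEBWED, and the function name is hypothetical.

```python
import math


def summary_from_factorial_moments(f1, f2, f3, f4):
    """Convert the first four factorial moments into summary measures."""
    # Raw moments via Stirling numbers of the second kind.
    m1 = f1
    m2 = f2 + f1
    m3 = f3 + 3 * f2 + f1
    m4 = f4 + 6 * f3 + 7 * f2 + f1
    # Central moments.
    variance = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return {
        "mean": m1,
        "variance": variance,
        "dispersion_index": variance / m1,
        "skewness": mu3 / variance ** 1.5,
        "kurtosis": mu4 / variance ** 2,
    }


# Check case (stand-in): Poisson with rate 2 has factorial moments 2, 4, 8, 16.
lam = 2.0
measures = summary_from_factorial_moments(lam, lam ** 2, lam ** 3, lam ** 4)
print(measures)
```

A dispersion index above one indicates over-dispersion, below one under-dispersion, and exactly one equi-dispersion, as in the Poisson check case.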
Proposition 4: The probability generating function (pgf) of the PEBWED is given by
for s ∈ (−1, 1).
Proof: The pgf of the PEBWED is derived using the well-known compound-Poisson theory in the manner described below,
which completes the proof. The moment generating function (mgf) and characteristic function (cf) of the PEBWED are obtained from Eq. (14) when s is replaced by e^t and e^{it}, respectively. They are provided, respectively, by
for t ≤ 0, and
for t ∈ R.
The mean, variance, DI, skewness, and kurtosis of the PEBWED are shown numerically in Table 1 for various parameter choices.

PARAMETER ESTIMATION
In this section, the model parameter is estimated using the maximum likelihood approach based on complete sampling, censored sampling, and censored sampling with a cure fraction. This section also covers parameter estimation using the Bayesian approach.

ML estimation based on complete data
Let X_1, X_2, ..., X_n be a random sample obtained from a PEBWE distribution. The log-likelihood function is defined as follows:
To obtain the maximum likelihood (ML) estimator of the parameter, differentiate Eq. (17) with respect to β:
Equating Eq. (18) to zero and solving for β yields the ML estimator. The resulting expression has no closed-form solution, so numerical methods are required to obtain the ML estimate of the parameter.
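When the likelihood equation has no closed-form root, a one-dimensional numerical search is usually enough for a single parameter. The sketch below maximizes a log-likelihood by golden-section search. Because the PEBWED pmf is not reproduced here, it uses a stand-in one-parameter pmf, p(x; β) = β / (1 + β)^(x + 1), which arises from a Poisson-exponential mixture; this is an assumption for illustration only. That stand-in has the closed-form estimate n / Σx, which lets the optimizer be checked. The counts and all function names are hypothetical.

```python
import math


def log_likelihood(beta: float, data) -> float:
    """Log-likelihood under the stand-in pmf beta / (1 + beta)^(x + 1)."""
    n, total = len(data), sum(data)
    return n * math.log(beta) - (total + n) * math.log1p(beta)


def golden_section_max(f, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


counts = [0, 1, 1, 2, 3, 0, 4, 2, 1, 6]  # hypothetical sample
beta_hat = golden_section_max(lambda b: log_likelihood(b, counts), 1e-6, 50.0)
print(beta_hat, len(counts) / sum(counts))  # numerical vs closed-form estimate
```

Golden-section search only needs function values, so it applies unchanged to any unimodal one-parameter log-likelihood; the search interval must bracket the maximum.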

ML estimation based on censored data
Given a random sample (x_i, δ_i), i = 1, ..., n, of size n, the ith individual's contribution to the likelihood function is given by
$$L_i = \left[p(x_i; \beta)\right]^{\delta_i} \left[S(x_i; \beta)\right]^{1 - \delta_i},$$
where δ_i is a censoring indicator variable, equal to one for an observed survival time and zero for a right-censored one. When the data follow the PEBWE distribution, the likelihood function for the model parameter is given by
The corresponding log-likelihood function is
Differentiating the log-likelihood function with respect to β gives
Setting Eq. (21) to zero gives the corresponding score equation, whose numerical solution yields the ML estimator.
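The censored-data score equation can be solved with a simple root finder. The sketch below assumes the usual right-censoring construction, in which an observed time contributes log p(x) and a censored time contributes log S(x), with S(x) taken as P(X > x); that convention is an assumption. The stand-in pmf β / (1 + β)^(x + 1) replaces the PEBWED pmf, which is not reproduced here, so the resulting estimate is not the value reported in Table 4. The data are the leukemia remission times listed under Data II, with 0 marking the "+" (censored) entries. The score is approximated by a central difference and its root is found by bisection; all function names are hypothetical.

```python
import math

# Remission times from Data II; events[i] = 0 marks a "+" (censored) time.
times = [1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14, 18, 19, 24, 26,
         29, 31, 42, 45, 50, 57, 60, 71, 85, 91]
events = [1] * 21 + [0, 1, 0, 0, 1, 1, 0, 0, 1]


def log_pmf(x: int, beta: float) -> float:
    """Stand-in log pmf (assumption): log of beta / (1 + beta)^(x + 1)."""
    return math.log(beta) - (x + 1) * math.log1p(beta)


def log_survival(x: int, beta: float) -> float:
    """Stand-in log survival, log P(X > x) = -(x + 1) log(1 + beta)."""
    return -(x + 1) * math.log1p(beta)


def censored_loglik(beta: float) -> float:
    return sum(log_pmf(x, beta) if d else log_survival(x, beta)
               for x, d in zip(times, events))


def score(beta: float, h: float = 1e-6) -> float:
    """Central-difference approximation to the derivative of the log-likelihood."""
    return (censored_loglik(beta + h) - censored_loglik(beta - h)) / (2.0 * h)


def solve_score(lo: float = 1e-3, hi: float = 5.0, tol: float = 1e-10) -> float:
    """Bisection on the score; assumes score(lo) > 0 > score(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta_hat = solve_score()
print("stand-in ML estimate:", beta_hat)
```

For this stand-in the root has a closed form, D / (T − D) with D the number of observed events and T = Σ(x_i + 1), which provides a direct check on the numerical solution.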

ML estimation based on censored data and a cure fraction
In survival analysis, a subset of individuals may appear to be immune to the event of interest. In clinical trials, some patients who respond to the treatment may experience prolonged symptom relief or even a complete recovery. The survival function of the conventional mixture model is given by
$$S(x) = \eta + (1 - \eta)\, S_0(x),$$
where η ∈ (0, 1) is the proportion of immunes, or cure fraction, and S_0(x) is a baseline survival function for susceptible individuals. Given a random sample (x_i, δ_i), i = 1, ..., n, of size n, the ith subject's contribution to the likelihood function is given by
$$L_i = \left[(1 - \eta)\, f_0(x_i)\right]^{\delta_i} \left[\eta + (1 - \eta)\, S_0(x_i)\right]^{1 - \delta_i},$$
where f_0(x) is the baseline pdf of the susceptible individuals and δ_i is a censoring indicator variable. The likelihood and log-likelihood functions for the parameter β are given below.
The ML estimators are obtained by differentiating the log-likelihood function with respect to the parameters, setting the resulting derivatives to zero, and solving the resulting equations numerically.
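With a cure fraction there are two parameters to estimate, the baseline parameter and the cured proportion. The sketch below assumes the standard mixture cure form, in which the population survival is η + (1 − η) S_0(x), an observed event contributes (1 − η) f_0(x), and a censored time contributes the population survival. The baseline is again the stand-in pmf β / (1 + β)^(x + 1), an assumption since the PEBWED pmf is not reproduced here, and the small dataset is hypothetical. A coarse grid search stands in for solving the score equations.

```python
import math


def baseline_pmf(x: int, beta: float) -> float:
    """Stand-in baseline pmf for susceptible subjects (assumption)."""
    return beta / (1.0 + beta) ** (x + 1)


def baseline_survival(x: int, beta: float) -> float:
    """Stand-in baseline survival P(X > x) for susceptible subjects."""
    return (1.0 + beta) ** (-(x + 1))


def cure_survival(x: int, beta: float, eta: float) -> float:
    """Population survival: cured fraction eta plus surviving susceptibles."""
    return eta + (1.0 - eta) * baseline_survival(x, beta)


def cure_loglik(beta: float, eta: float, times, events) -> float:
    total = 0.0
    for x, d in zip(times, events):
        if d:   # observed event: subject was susceptible and failed at x
            total += math.log((1.0 - eta) * baseline_pmf(x, beta))
        else:   # censored: either cured or still at risk beyond x
            total += math.log(cure_survival(x, beta, eta))
    return total


def grid_search(times, events):
    """Coarse grid maximization over beta in (0, 2] and eta in (0, 1)."""
    best = (-math.inf, None, None)
    for i in range(1, 201):
        beta = i / 100.0
        for j in range(1, 100):
            eta = j / 100.0
            ll = cure_loglik(beta, eta, times, events)
            if ll > best[0]:
                best = (ll, beta, eta)
    return best


# Hypothetical data: seven early events and three long censored follow-ups.
times = [1, 2, 2, 3, 5, 6, 8, 40, 40, 40]
events = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
loglik_hat, beta_hat, eta_hat = grid_search(times, events)
print(beta_hat, eta_hat)
```

In practice the grid optimum would be refined with a proper optimizer; the grid is used here only to keep the sketch dependency-free.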

Bayesian estimation
The Bayesian approach has become one of the most widely used techniques across a range of domains. It is especially helpful in engineering, reliability, health sciences, epidemiology, and quality studies because it can incorporate prior information into the analysis. Under this approach, a prior distribution must be assigned to each parameter. For the PEBWE distribution, we consider the gamma distribution as the prior for the parameter β and the beta distribution as the prior for the cure fraction parameter η. The density functions for the gamma and beta distributions are
where τ_1, λ_1, τ_2, λ_2 are the hyperparameters.
The joint posterior is obtained by multiplying the likelihood function given in Eq. (17) by the prior densities. To simulate samples from the posterior density, we used Markov chain Monte Carlo (MCMC) procedures such as Gibbs sampling. We generated 1,006,000 samples for each parameter. The first 6,000 simulated samples were discarded as a burn-in phase, which is often used to reduce the influence of the starting values. The Bayesian estimates of the parameters were obtained as the means of the samples drawn from the joint posterior distribution. Traceplots and the Geweke diagnostic were used to monitor the convergence of the simulated samples. Further, the 95% highest posterior density (HPD) intervals were obtained from the simulated posterior distributions.
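The workflow of sampling, discarding a burn-in, and summarizing by the posterior mean and an HPD interval can be illustrated on a toy problem. The sketch below is not the sampler used in the study: it uses a random-walk Metropolis step rather than Gibbs sampling, a Poisson likelihood with a gamma prior as a stand-in model (chosen because its posterior is known in closed form, so the output can be checked), hypothetical data, and far fewer draws than the 1,006,000 reported above. The hyperparameter names `shape` and `rate` are assumptions.

```python
import math
import random


def log_posterior(lam: float, data, shape: float, rate: float) -> float:
    """Unnormalized log posterior: gamma prior times Poisson likelihood."""
    if lam <= 0.0:
        return -math.inf
    log_prior = (shape - 1.0) * math.log(lam) - rate * lam
    log_lik = sum(data) * math.log(lam) - len(data) * lam
    return log_prior + log_lik


def metropolis(log_post, start: float, n_iter: int, step: float, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    chain, current, current_lp = [], start, log_post(start)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


def hpd_interval(samples, mass: float = 0.95):
    """Shortest interval containing the requested share of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


data = [2, 3, 1, 4, 2, 3, 5, 2]   # hypothetical counts
shape, rate = 1.0, 1.0            # hypothetical gamma hyperparameters
rng = random.Random(1)
chain = metropolis(lambda v: log_posterior(v, data, shape, rate),
                   start=1.0, n_iter=20000, step=0.8, rng=rng)
kept = chain[2000:]               # discard burn-in draws
post_mean = sum(kept) / len(kept)
lower, upper = hpd_interval(kept)
exact_mean = (shape + sum(data)) / (rate + len(data))  # conjugate result
print(post_mean, exact_mean, (lower, upper))
```

For a two-parameter posterior such as (β, η), the same step can be applied to each parameter in turn, with the beta prior restricting η to (0, 1).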

Simulation
Here, we conduct a comprehensive simulation analysis to assess the maximum likelihood estimation approach using complete data.Random samples of the PEBWE distribution of sizes (n) 10, 20, 50, 100, and 200 were used considering different values of the parameter (β).All simulation results were based on N = 10,000 replications for the different sample sizes considered for each parameter setting.Table 2 shows the results of the average estimates, absolute bias (AB), mean relative error (MRE), and mean square error (MSE) of all parameter values.
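A Monte Carlo study of this kind follows a fixed recipe: draw a sample, estimate the parameter, repeat, and summarize. The sketch below illustrates the recipe under stated assumptions. Samples come from the stand-in pmf β / (1 + β)^(x + 1) by inverse transform, since the PEBWED sampler is not reproduced here, and its closed-form estimate n / Σx replaces the numerical MLE. The definitions used for AB (absolute difference between the average estimate and the true value) and MRE (average of |estimate − β| / β) are assumptions, because the study's exact formulas are not shown. Fewer replications than N = 10,000 are used to keep the run short.

```python
import math
import random


def draw_sample(beta: float, n: int, rng):
    """Inverse-transform draws from the stand-in pmf beta / (1 + beta)^(x + 1)."""
    log_q = -math.log1p(beta)  # log of 1 / (1 + beta)
    return [int(math.log(1.0 - rng.random()) / log_q) for _ in range(n)]


def simulate(beta: float, n: int, n_rep: int, rng):
    """Summarize the estimator n / sum(x) over n_rep simulated samples."""
    estimates = []
    while len(estimates) < n_rep:
        total = sum(draw_sample(beta, n, rng))
        if total == 0:   # estimate undefined for an all-zero sample; redraw
            continue
        estimates.append(n / total)
    average = sum(estimates) / n_rep
    return {
        "average": average,
        "ab": abs(average - beta),                                      # assumed definition
        "mre": sum(abs(e - beta) / beta for e in estimates) / n_rep,  # assumed definition
        "mse": sum((e - beta) ** 2 for e in estimates) / n_rep,
    }


rng = random.Random(2024)
for n in (20, 50, 200):
    print(n, simulate(0.5, n, 2000, rng))
```

If the estimator is consistent, the MSE should shrink as n grows, which is the pattern a table such as Table 2 is meant to display.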

APPLICATION
In this section, the new model is applied to three over-dispersed, asymmetric, right-skewed datasets. We compare the fit of the PEBWE distribution with that of the Poisson Ailamujia (PA), discrete Burr Hatke (DBH), discrete inverted Topp-Leone (DITL), discrete moment exponential (DME), and Poisson distributions. Several model selection and goodness-of-fit criteria are used to compare the fitted models: the log-likelihood (L), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the Kolmogorov-Smirnov test.
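The two information criteria follow directly from a fitted model's maximized log-likelihood. The sketch below uses the standard definitions, AIC = 2k − 2L and BIC = k ln(n) − 2L, where k is the number of estimated parameters and n the sample size. The log-likelihood values and model labels are hypothetical placeholders, not results from Tables 3 to 5.

```python
import math


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion; smaller values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; penalizes parameters by log(sample size)."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Hypothetical fitted log-likelihoods: (log-likelihood, number of parameters).
fits = {"model_A": (-100.0, 1), "model_B": (-98.5, 2)}
n_obs = 30
table = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in fits.items()}
best_by_aic = min(table, key=lambda name: table[name][0])
print(table, best_by_aic)
```

Because every candidate considered here has a single parameter, ranking by AIC or BIC among them is equivalent to ranking by log-likelihood; the penalties matter only when models with different parameter counts are compared.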
Data II: The second dataset consists of remission times (in weeks) for a group of 30 patients with leukemia who received similar treatment (Lawless, 2011). The observations are: 1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14, 18, 19, 24, 26, 29, 31+, 42, 45+, 50+, 57, 60, 71+, 85+, 91. Observations marked with "+" are censored times. Using the methodology outlined in "ML estimation based on censored data", we compute the MLEs. Table 4 shows the ML estimates and goodness-of-fit metrics. Figure 5 shows a comparison of the PP plots for the model based on the PEBWE distribution and the competitive discrete distributions.
We obtain the ML estimates using the approach described in "ML estimation based on censored data and a cure fraction". Table 5 shows the ML estimates and goodness-of-fit metrics. The PP plots based on all competitive distributions are given in Fig. 7. The PEBWE distribution provides the best fit. As in the previous example, for the Bayesian estimation we used gamma and beta distributions as priors for the parameters β and η. The posterior means of the parameters are β = 0.083 with a 95% HPD interval (0.0301-0.1401) and η = 0.6363 with a 95% interval (0.4131-0.8434). The posterior samples for the parameter are presented in Fig. 8. The ACF (autocorrelation function) indicates that the posterior samples are independent, and the traceplot shows the evolution of the MCMC samples over the iterations. The Geweke z-score (0.6607) also indicates satisfactory convergence of the drawn samples to a stable distribution.

Table 1
Some computational measures of the PEBWED.

Table 2
Simulation results based on complete data.

Table 3
The MLEs and model selection measures for the first dataset.

Table 4
The MLEs and model selection measures for the second dataset.

Table 5
The MLEs and model selection measures for the third dataset.